Burke County
Closing the Performance Gap Between AI and Radiologists in Chest X-Ray Reporting
Sharma, Harshita, Reynolds, Maxwell C., Salvatelli, Valentina, Sykes, Anne-Marie G., Horst, Kelly K., Schwaighofer, Anton, Ilse, Maximilian, Melnichenko, Olesya, Bond-Taylor, Sam, Pérez-García, Fernando, Mugu, Vamshi K., Chan, Alex, Colak, Ceylan, Swartz, Shelby A., Nashawaty, Motassem B., Gonzalez, Austin J., Ouellette, Heather A., Erdal, Selnur B., Schueler, Beth A., Wetscherek, Maria T., Codella, Noel, Jain, Mohit, Bannur, Shruthi, Bouzid, Kenza, Castro, Daniel C., Hyland, Stephanie, Korfiatis, Panos, Khandelwal, Ashish, Alvarez-Valle, Javier
AI-assisted report generation offers the opportunity to reduce radiologists' workload stemming from expanded screening guidelines, complex cases and workforce shortages, while maintaining diagnostic accuracy. In addition to describing pathological findings in chest X-ray reports, interpreting lines and tubes (L&T) is demanding and repetitive for radiologists, especially with high patient volumes. We introduce MAIRA-X, a clinically evaluated multimodal AI model for longitudinal chest X-ray (CXR) report generation, that encompasses both clinical findings and L&T reporting. Developed using a large-scale, multi-site, longitudinal dataset of 3.1 million studies (comprising 6 million images from 806k patients) from Mayo Clinic, MAIRA-X was evaluated on three holdout datasets and the public MIMIC-CXR dataset, where it significantly improved AI-generated reports over the state of the art on lexical quality, clinical correctness, and L&T-related elements. A novel L&T-specific metrics framework was developed to assess accuracy in reporting attributes such as type, longitudinal change and placement. A first-of-its-kind retrospective user evaluation study was conducted with nine radiologists of varying experience, who blindly reviewed 600 studies from distinct subjects. The user study found comparable rates of critical errors (3.0% for original vs. 4.6% for AI-generated reports) and a similar rate of acceptable sentences (97.8% for original vs. 97.4% for AI-generated reports), marking a significant improvement over prior user studies with larger gaps and higher error rates. Our results suggest that MAIRA-X can effectively assist radiologists, particularly in high-volume clinical settings.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > North Dakota > Burke County (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
- Information Technology > Sensing and Signal Processing > Image Processing (0.92)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.67)
AI-Mediated Communication Reshapes Social Structure in Opinion-Diverse Groups
Huq, Faria, Claggett, Elijah L., Shirado, Hirokazu
Group segregation or cohesion can emerge from micro-level communication, and AI-assisted messaging may shape this process. Here, we report a preregistered online experiment (N = 557 across 60 sessions) in which participants discussed controversial political topics over multiple rounds and could freely change groups. Some participants received real-time message suggestions from a large language model (LLM), either personalized to their stance ("individual assistance") or incorporating their group members' perspectives ("relational assistance"). We find that small variations in AI-mediated communication cascade into macro-level differences in group composition. Participants with individual assistance send more messages and show greater stance-based clustering, whereas those with relational assistance use more receptive language and form more heterogeneous ties. Hybrid expressive processes--jointly produced by humans and AI--can reshape collective organization. The patterns of structural division and cohesion depend on how AI incorporates users' interaction context. Understanding how micro-level communication patterns accumulate into macro-level group segregation or cohesion is a central question in social and behavioral science [1-3]. Conversations across differences are often asymmetric: people find it difficult to engage constructively with those who hold opposing views [4, 5], and stereotypes bias perceptions of outgroup members [6]. Online platforms can intensify these dynamics through lowered inhibitions [9], emotion-amplified diffusion [10], and algorithmic or behavioral clustering processes [11-13]. While the forces that produce social division are well theorized and empirically documented, far less is known about the micro-level conversational mechanisms that can instead generate cohesion in ideologically diverse groups [14-16].
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Law (0.68)
- Government > Regional Government (0.47)
- Government > Immigration & Customs (0.46)
- Africa > Cameroon > Gulf of Guinea (0.04)
- North America > United States > Texas > Kleberg County (0.04)
- North America > United States > Texas > Chambers County (0.04)
- (10 more...)
- Research Report > Experimental Study (1.00)
- Overview (0.92)
- Workflow (0.67)
- North America > United States > California > Santa Clara County > Stanford (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (3 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.93)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > North Dakota > Burke County (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (5 more...)
- Leisure & Entertainment > Sports (1.00)
- Transportation (0.69)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > China (0.04)
- (7 more...)
- Leisure & Entertainment > Sports (1.00)
- Transportation (0.93)
- Information Technology (0.92)
- Government (0.68)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit
We present OnPrem.LLM, a Python-based toolkit for applying large language models (LLMs) to sensitive, non-public data in offline or restricted environments. The system is designed for privacy-preserving use cases and provides prebuilt pipelines for document processing and storage, retrieval-augmented generation (RAG), information extraction, summarization, classification, and prompt/output processing with minimal configuration. OnPrem.LLM supports multiple LLM backends -- including llama.cpp, Ollama, vLLM, and Hugging Face Transformers -- with quantized model support, GPU acceleration, and seamless backend switching. Although designed for fully local execution, OnPrem.LLM also supports integration with a wide range of cloud LLM providers when permitted, enabling hybrid deployments that balance performance with data control. A no-code web interface extends accessibility to non-technical users.
- North America > United States > Virginia > Alexandria County > Alexandria (0.04)
- North America > United States > North Dakota > Burke County (0.04)
Jackal: A Real-World Execution-Based Benchmark Evaluating Large Language Models on Text-to-JQL Tasks
Frank, Kevin, Gulati, Anmol, Lumer, Elias, Campagna, Sindy, Subbiah, Vamse Kumar
Enterprise teams rely on the Jira Query Language (JQL) to retrieve and filter issues from Jira. Yet, to our knowledge, there is no open, real-world, execution-based benchmark for mapping natural language queries to JQL. We introduce Jackal, a novel, large-scale text-to-JQL benchmark comprising 100,000 natural language (NL) requests paired with validated JQL queries and execution-based results on a live Jira instance with over 200,000 issues. To reflect real-world usage, each JQL query is associated with four types of user requests: (i) Long NL, (ii) Short NL, (iii) Semantically Similar, and (iv) Semantically Exact. We release Jackal, a corpus of 100,000 text-to-JQL pairs, together with an execution-based scoring toolkit, and a static snapshot of the evaluated Jira instance for reproducibility. We report text-to-JQL results on 23 Large Language Models (LLMs) spanning parameter sizes, open and closed source models, across execution accuracy, exact match, and canonical exact match. In this paper, we report results on Jackal-5K, a 5,000-pair subset of Jackal. On Jackal-5K, the best overall model (Gemini 2.5 Pro) achieves only 60.3% execution accuracy averaged equally across four user request types. Performance varies significantly across user request types: (i) Long NL (86.0%), (ii) Short NL (35.7%), (iii) Semantically Similar (22.7%), and (iv) Semantically Exact (99.3%). By benchmarking LLMs on their ability to produce correct and executable JQL queries, Jackal exposes the limitations of current state-of-the-art LLMs and sets a new, execution-based challenge for future research in Jira enterprise data.
- North America > United States > North Dakota > Burke County (0.05)
- North America > United States > Pennsylvania (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- (2 more...)
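The Jackal abstract above reports execution accuracy "averaged equally across four user request types". A minimal sketch of that kind of macro-average is shown here in Python. It is an illustration, not the Jackal scoring toolkit: the function name `macro_execution_accuracy`, the record layout, and the rule that two queries match when they return the same set of issue keys are all assumptions made for this example.

```python
from collections import defaultdict


def macro_execution_accuracy(records):
    """Return (per-type accuracy, equally weighted mean over request types).

    records: iterable of (request_type, predicted_keys, gold_keys).
    predicted_keys is None when the generated query failed to execute.
    ASSUMPTION: a prediction counts as correct when it returns the same
    set of issue keys as the gold query; the real benchmark may differ.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for request_type, predicted_keys, gold_keys in records:
        totals[request_type] += 1
        if predicted_keys is not None and set(predicted_keys) == set(gold_keys):
            hits[request_type] += 1
    per_type = {t: hits[t] / totals[t] for t in totals}
    # Each request type contributes equally, regardless of its sample count.
    macro = sum(per_type.values()) / len(per_type)
    return per_type, macro


# Hypothetical issue keys, for illustration only.
example = [
    ("long", ["A-1", "A-2"], ["A-2", "A-1"]),  # same set, order ignored
    ("long", ["A-1"], ["A-3"]),                # wrong result set
    ("short", None, ["A-1"]),                  # query did not execute
    ("short", ["A-4"], ["A-4"]),               # correct
    ("short", [], ["A-4"]),                    # empty result, incorrect
]
per_type, macro = macro_execution_accuracy(example)
```

Weighting each request type equally keeps a large category from dominating the headline number, which matters when per-type accuracy varies as widely as the abstract describes.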
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
Yang, Sihan, Xu, Runsen, Xie, Yiman, Yang, Sizhe, Li, Mo, Lin, Jingli, Zhu, Chenming, Chen, Xiaochen, Duan, Haodong, Yue, Xiangyu, Lin, Dahua, Wang, Tai, Pang, Jiangmiao
Spatial intelligence is essential for multimodal large language models (MLLMs) operating in the complex physical world. Existing benchmarks, however, probe only single-image relations and thus fail to assess the multi-image spatial reasoning that real-world deployments demand. We introduce MMSI-Bench, a VQA benchmark dedicated to multi-image spatial intelligence. Six 3D-vision researchers spent more than 300 hours meticulously crafting 1,000 challenging, unambiguous multiple-choice questions from over 120,000 images, each paired with carefully designed distractors and a step-by-step reasoning process. We conduct extensive experiments and thoroughly evaluate 34 open-source and proprietary MLLMs, observing a wide gap: the strongest open-source model attains roughly 30% accuracy and OpenAI's o3 reasoning model reaches 40%, while humans score 97%. These results underscore the challenging nature of MMSI-Bench and the substantial headroom for future research. Leveraging the annotated reasoning processes, we also provide an automated error analysis pipeline that diagnoses four dominant failure modes, including (1) grounding errors, (2) overlap-matching and scene-reconstruction errors, (3) situation-transformation reasoning errors, and (4) spatial-logic errors, offering valuable insights for advancing multi-image spatial intelligence. Project page: https://runsenxu.com/projects/MMSI_Bench
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Hong Kong (0.04)
- North America > United States > North Dakota > Burke County (0.04)
- (2 more...)
- Information Technology (0.46)
- Law (0.46)
Data Scaling Laws for Radiology Foundation Models
Ilse, Maximilian, Sharma, Harshita, Schwaighofer, Anton, Bond-Taylor, Sam, Pérez-García, Fernando, Melnichenko, Olesya, Sykes, Anne-Marie G., Horst, Kelly K., Khandelwal, Ashish, Reynolds, Maxwell, Wetscherek, Maria T., Codella, Noel C. F., Alvarez-Valle, Javier, Panagiotis, Korfiatis, Salvatelli, Valentina
Foundation vision encoders such as CLIP and DINOv2, trained on web-scale data, exhibit strong transfer performance across tasks and datasets. However, medical imaging foundation models remain constrained by smaller datasets, limiting our understanding of how data scale and pretraining paradigms affect performance in this setting. In this work, we systematically study continual pretraining of two vision encoders, MedImageInsight (MI2) and RAD-DINO, representing the two major encoder paradigms CLIP and DINOv2, on up to 3.5M chest x-rays from a single institution, holding compute and evaluation protocols constant. We evaluate on classification (radiology findings, lines and tubes), segmentation (lines and tubes), and radiology report generation. While prior work has primarily focused on tasks related to radiology findings, we include lines and tubes tasks to counterbalance this bias and evaluate a model's ability to extract features that preserve continuity along elongated structures. Our experiments show that MI2 scales more effectively for finding-related tasks, while RAD-DINO is stronger on tube-related tasks. Surprisingly, continually pretraining MI2 with both reports and structured labels using UniCL improves performance, underscoring the value of structured supervision at scale. We further show that for some tasks, as few as 30k in-domain samples are sufficient to surpass open-weights foundation models. These results highlight the utility of center-specific continual pretraining, enabling medical institutions to derive significant performance gains by utilizing in-domain data.
- North America > United States > North Dakota > Burke County (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)